Genes and Athletic Performance: The 2023 Update

Phenotypes of athletic performance and exercise capacity are complex traits influenced by both genetic and environmental factors. This update on the panel of genetic markers (DNA polymorphisms) associated with athlete status summarises recent advances in sports genomics research, including findings from candidate gene and genome-wide association (GWAS) studies, meta-analyses, and findings involving larger-scale initiatives such as the UK Biobank. As of the end of May 2023, a total of 251 DNA polymorphisms have been associated with athlete status, of which 128 genetic markers were positively associated with athlete status in at least two studies (41 endurance-related, 45 power-related, and 42 strength-related). The most promising genetic markers include the AMPD1 rs17602729 C, CDKN1A rs236448 A, HFE rs1799945 G, MYBPC3 rs1052373 G, NFIA-AS2 rs1572312 C, PPARA rs4253778 G, and PPARGC1A rs8192678 G alleles for endurance; ACTN3 rs1815739 C, AMPD1 rs17602729 C, CDKN1A rs236448 C, CPNE5 rs3213537 G, GALNTL6 rs558129 T, IGF2 rs680 G, IGSF3 rs699785 A, NOS3 rs2070744 T, and TRHR rs7832552 T alleles for power; and ACTN3 rs1815739 C, AR ≥21 CAG repeats, LRPPRC rs10186876 A, MMS22L rs9320823 T, PHACTR1 rs6905419 C, and PPARG rs1801282 G alleles for strength. It should be appreciated, however, that elite performance still cannot be predicted well using only genetic testing.

Starting in the late 1990s, research began to identify DNA polymorphisms associated with predisposition to certain types of sports and exercise-related phenotypes, with initial focus on variants of the ACE, ACTN3, AMPD1, PPARA, PPARD, and PPARGC1A genes [13][14][15][16][17][18][19][20][21][22][23]. Initially, most research was conducted using the candidate gene approach [24][25][26][27][28][29], which limited progress in the discovery of new genetic markers associated with exercise- and sport-related phenotypes [30]. In addition to the fact that this approach studies only a single genetic variant in isolation, most candidate gene studies in the field of sports genomics are limited by sample size. This is a potential source of type I error (false positive findings), which is why replication of positive associations in independent cohorts is essential.
Figure 1. Case-control study designs in sports genomics. In this approach, allelic frequencies are compared between athletes and controls (e.g., endurance athletes vs. untrained subjects or endurance vs. power athletes). A case-control study may be the first step followed by a genotype-phenotype study (e.g., identification of VO2max or weightlifting performance-increasing genotypes among athletes). In some cases, studies begin with a genotype-phenotype approach, and the findings are subsequently validated by a case-control study.
Another approach that has proven effective in addressing the possibility of false positive results in sports genomics literature is to perform replication studies in two or more independent athletic cohorts (even with small or moderate sample sizes), followed by a meta-analysis to quantify the overall effect of a polymorphism on athlete status and/or a sport- and exercise-related trait [43,52-65]. However, in some cases, replication is not possible due to the exclusivity of a polymorphism to specific populations based on their geographic ancestry. For example, the rs671 G/A polymorphism of the aldehyde dehydrogenase 2 (ALDH2) gene was associated with strength in athletes and non-athletes from the Japanese population [66][67][68]. Interestingly, the unfavourable (associated with reduced strength) rs671 A allele is not present in Europeans or South Asians (frequency 0%), but is common in Chinese, Japanese, and Vietnamese populations (15-25%). This demonstrates a notable challenge when seeking to replicate genomic findings in larger samples, as any increase in sample size must also take the geographic ancestry of participants into account. This also highlights the possibility that the genetic determinants of some sport- and exercise-related phenotypes are restricted to certain populations, demonstrating that increasing sample size is not as straightforward as simply recruiting participants from multiple countries and/or continents.
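The replication-then-meta-analysis strategy described above can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance model applied to log odds ratios; the studies cited here may well have used other models (e.g., random effects), and the function name `pooled_odds_ratio` and all allele counts are hypothetical.

```python
from math import erfc, exp, log, sqrt

def pooled_odds_ratio(tables):
    """Fixed-effect (inverse-variance) pooling of odds ratios.

    Each table is (a, b, c, d): counts of the target allele and the other
    allele among athletes (a, b) and among controls (c, d).
    All counts must be non-zero for the formulas below to be defined.
    """
    weights, weighted_logs = [], []
    for a, b, c, d in tables:
        log_or = log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf variance of log OR
        weights.append(1 / variance)
        weighted_logs.append(log_or / variance)
    pooled_log = sum(weighted_logs) / sum(weights)
    se = sqrt(1 / sum(weights))
    z = pooled_log / se
    return {
        "odds_ratio": exp(pooled_log),
        "ci95": (exp(pooled_log - 1.96 * se), exp(pooled_log + 1.96 * se)),
        "p_value": erfc(abs(z) / sqrt(2)),  # two-sided normal p-value
    }

# Hypothetical allele counts from a larger and a smaller cohort
cohorts = [(150, 50, 120, 80), (30, 10, 24, 16)]
print(pooled_odds_ratio(cohorts))
```

Because each cohort is weighted by the inverse of its variance, a small cohort contributes less to the pooled estimate than a large one, which is why pooling several modest cohorts can still narrow the confidence interval.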
As well as the phenotypes of athlete status or competitive performance, several recent studies have investigated a broader range of traits which may relate directly or indirectly to athletic capability. These include flexibility, coordination, cardiorespiratory fitness, spatial ability, stress resilience, mental toughness, fat loss efficiency, and cardiovascular and metabolic responses to training, amongst others [69][70][71][72][73][74][75][76][77][78][79][80][81][82][83][84].
For example, combat athletes are more likely than untrained subjects to have the warrior (COMT rs4680 GG) genotype [85], whilst chess players demonstrate an increased frequency of an allele linked to improved memory and spatial ability (KIBRA rs17070145 T) [86]. Such discoveries demonstrate the broadening nature of sports genomics in recent times, with focus expanding from the traditional domain of investigating what makes elite performers different from the general population into other domains, such as sports nutrigenetics [87][88][89][90][91][92][93][94][95][96][97][98][99][100] and areas of sports medicine, such as genomic variants associated with soft-tissue injuries and sports-related concussion [101][102][103][104][105][106][107][108][109][110][111][112][113][114].
Technological advancement has lowered the cost of conducting genomic studies, increasing accessibility to researchers who wish to investigate the genetic underpinnings of sport and exercise phenotypes. Consequently, sports genomics is a dynamic and continually developing field, making it important to regularly appraise the contribution of recent advances to the field. Therefore, the aim of the current review was to summarise recent progress in understanding the genetic determinants of athlete status, and to detail novel DNA polymorphisms that may underpin differences between individuals in their athletic potential.
At the time of writing (end of May 2023), the total number of DNA polymorphisms associated with athletic performance since the first discovery in 1998 is 251 (Figure 2). Our search for sports genomics publications was based on journals indexed in major databases (e.g., PubMed) using specific keywords (e.g., athletes + polymorphism/genotype). However, not all articles were included in the current review due to language limitations (articles written in languages other than English had to contain at least an abstract in English). In addition, papers with very small cohorts (fewer than 25 athletes/controls) and articles with combined groups of athletes (for example, endurance + power without separation) were not included. Abstracts of conference proceedings were not considered. In recognition of the fact that many studies in the field of sports genomics report associations based on small sample sizes, we stipulated that only markers for which statistically significant associations have been reported in at least two studies (two case-control studies and/or one case-control plus one functional study; including those presented in one article) would be included in the present review.
The most promising genetic markers (those that, among other criteria, had fewer negative or controversial findings) include AMPD1 rs17602729 C, CDKN1A rs236448 A, HFE rs1799945 G, MYBPC3 rs1052373 G, NFIA-AS2 rs1572312 C, PPARA rs4253778 G, and PPARGC1A rs8192678 G alleles for endurance; ACTN3 rs1815739 C, AMPD1 rs17602729 C, CDKN1A rs236448 C, CPNE5 rs3213537 G, GALNTL6 rs558129 T, IGF2 rs680 G, IGSF3 rs699785 A, NOS3 rs2070744 T, and TRHR rs7832552 T alleles for power; and ACTN3 rs1815739 C, AR ≥21 CAG repeats, LRPPRC rs10186876 A, MMS22L rs9320823 T, PHACTR1 rs6905419 C, and PPARG rs1801282 G alleles for strength. This update on the panel of genetic markers associated with athlete status covers advances in research reported in the past two years (the previous online version was published in 2021 [115]).
The current review also lists all known markers associated with endurance, power, or strength athlete status/performance. This article does not aim to review genetic markers associated with team (game) and combat sports, markers for which are well described elsewhere [26,61,116,117].

Gene Variants for Endurance Athlete Status
An individual's endurance capacity is determined by many factors, including their muscle fibre typology, haemoglobin mass, mitochondrial biogenesis, maximal cardiac output, and maximal rate of oxygen consumption (VO2max), among others [118][119][120][121][122][123][124]. Indeed, there is evidence that these intermediate phenotypes have a substantial genetic influence, with literature indicating that genetic factors account for up to 70% of the variability in endurance-related traits [125]. Usually, genetic markers associated with endurance athlete status are determined by comparing allelic frequencies between endurance athletes (e.g., biathletes, road cyclists, etc.) and controls.
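As an illustration of the allele-frequency comparison just described, the following is a minimal sketch of an allelic case-control test. It assumes a biallelic marker, a Pearson chi-square test on the 2x2 allele table without continuity correction, and hypothetical genotype counts; the function names are illustrative and are not taken from any of the cited studies.

```python
from math import erfc, sqrt

def allele_counts(genotypes):
    """Convert (hom_allele1, het, hom_allele2) genotype counts to allele counts."""
    hom1, het, hom2 = genotypes
    return 2 * hom1 + het, het + 2 * hom2

def allelic_test(athletes, controls):
    """Compare allele frequencies between athletes and controls."""
    a, b = allele_counts(athletes)  # athletes: allele 1, allele 2
    c, d = allele_counts(controls)  # controls: allele 1, allele 2
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (1 degree of freedom)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return {
        "freq_athletes": a / (a + b),
        "freq_controls": c / (c + d),
        "odds_ratio": (a * d) / (b * c),  # undefined if b or c is zero
        "chi2": chi2,
        "p_value": erfc(sqrt(chi2 / 2)),  # chi-square survival function, 1 df
    }

# Hypothetical genotype counts: (allele-1 homozygotes, heterozygotes, allele-2 homozygotes)
result = allelic_test(athletes=(60, 30, 10), controls=(40, 40, 20))
print(result)
```

With cohorts this small, a nominally significant p-value would still need replication in an independent cohort before the marker could be considered associated with athlete status.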
To support the observed findings from endurance-related case-control studies, researchers subsequently perform functional, lab-based studies to determine the relationship between genotypes and physiological measures. Examples of measurements used to complement genomic studies include (but are not limited to) VO2max, forced expiratory volume in one second (FEV1), proportion of slow-twitch muscle fibres, recovery speed, long-distance running performance, running economy, lactate threshold, erythropoietin and haemoglobin levels, number of erythrocytes, capillary density, mitochondrial density, fat metabolism, and fatigue resistance.
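A genotype-phenotype analysis of this kind can be sketched as a simple regression of a physiological measure on the number of copies of a candidate allele. The additive (per-allele) model, the function name `additive_effect`, and the VO2max values below are assumptions made purely for illustration; published studies typically also adjust for covariates such as sex and age.

```python
def additive_effect(allele_copies, phenotype):
    """Least-squares slope and intercept of a phenotype on allele copies (0, 1, or 2)."""
    n = len(allele_copies)
    mean_x = sum(allele_copies) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allele_copies, phenotype))
    slope = sxy / sxx  # estimated change in phenotype per additional allele copy
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: copies of the candidate allele and VO2max (mL/kg/min)
copies = [0, 0, 1, 1, 2, 2]
vo2max = [60, 62, 64, 66, 68, 70]
slope, intercept = additive_effect(copies, vo2max)
print(slope, intercept)  # 4.0 61.0
```

Here the slope is the estimated per-allele difference in the phenotype, which is the quantity a functional study would report alongside the case-control association.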

Gene Variants for Power Athlete Status
Several characteristics are positively associated with power performance, including circulating levels of testosterone, percentage and cross-sectional area of fast-twitch muscle fibres, muscle mass and strength, body and calcaneus height, muscle fascicle length, and reaction time, among others [3,238-244]. The heritability of power-related phenotypes, including jumping ability, reportedly ranges from approximately 49% to 86% [245,246]. Typically, genetic markers associated with power athlete status are determined by comparing allelic frequencies between power athletes (e.g., 100 m runners, shot putters, arm wrestlers, etc.) and untrained subjects. To support findings from case-control studies, investigators perform genotype-phenotype studies by measuring sprint times, jump performance, muscle fibre size, muscle fibre typology, maximal strength, rate of force development, and circulating levels of anabolic hormones such as testosterone.
Our literature search revealed that at least 45 of the 95 markers reportedly associated with power athlete status met our new criteria (Table 2). The most promising of these genetic markers associated with power athlete status currently include ACTN3 rs1815739 C, AMPD1 rs17602729 C, CDKN1A rs236448 C, CPNE5 rs3213537 G, GALNTL6 rs558129 T, IGF2 rs680 G, IGSF3 rs699785 A, NOS3 rs2070744 T, and TRHR rs7832552 T alleles. In contrast, the remaining 50 genetic markers (power alleles) did not meet our strict criteria: rs660339 C, VEGFR2 rs1870377 T, WAPL rs4934207 C, and ZNF423 rs11865138 C. The majority of these markers are reported in previous reviews [25,126,127] and should be validated in additional studies before they can meet the criteria to be included in our list of power-associated genetic variants.

Gene Variants for Strength Athlete Status
Performance in strength-based sports is based on multiple factors. However, the factors considered to contribute substantially to strength phenotypes include skeletal muscle hypertrophy (muscle fibre size), hyperplasia, the predominance of fast-twitch muscle fibres, a greater muscle fascicle pennation angle, improved neurological adaptation, high glycolytic capacity, and increased circulating testosterone [297]. Importantly, evidence exists that strength athletes exhibit vastly different transcriptomic, biochemical, anthropometric, physiological, and biomechanical characteristics compared to endurance athletes and/or controls [1,4]. These differences can be explained by the presence of both deliberate environmental (training, nutrition, etc.) and genetic factors. Indeed, studies indicate that there is a strong heritability of power- and strength-related traits, where genetic factors account for up to 85% of the variation in maximal isometric, isotonic, and isokinetic strength [246]. In a recent study investigating the genetic component of severe sarcopenia (the age-related decline in skeletal muscle mass, strength, and gait speed) [37], it was found that the alleles associated with higher risk of severe sarcopenia were closely linked to tiredness, alcohol intake, smoking, time spent watching television, and a higher self-reported consumption of salt and processed meat. In contrast, alleles associated with lower risk of severe sarcopenia were positively associated with levels of serum testosterone, IGF1, and 25-hydroxyvitamin D; height; physical activity; as well as indicators of healthier dietary habits (self-reported intake of cereal, cheese, oily fish, protein, water, fruit, and vegetables).
Whilst muscle strength phenotypes in the general population may be less pronounced than in strength athletes, the latter may represent an ideal population to identify genomic variants associated with skeletal muscle capacity, potentially aiding the advancement of knowledge surrounding sarcopenia and directing strategies to reduce the negative impact of age-related declines in muscle mass. In general, genetic markers associated with strength athlete status can be determined by comparing allelic frequencies between strength athletes and controls. To support these findings, scientists perform genotype-phenotype studies by measuring handgrip and isokinetic strength, powerlifting/weightlifting performance, as well as evaluating the acute and chronic responses to resistance training.
Previously, 170 DNA polymorphisms were reported to be associated with handgrip strength in three large GWASs [34-36]. In a follow-up study involving elite weightlifters and powerlifters, Moreland et al. [51] tested the hypothesis that alleles associated with greater handgrip strength would be over-represented in these athletes compared to controls. Accordingly, they identified 23 DNA polymorphisms that were associated with strength athlete status. Of these SNPs, the LRPPRC rs10186876, MMS22L rs9320823, and PHACTR1 rs6905419 polymorphisms were also associated with superior competitive weightlifting performance [298].
Our literature search based on our new inclusion criteria revealed that at least 42 genetic markers could be associated with strength athlete status (Table 3). The most promising genetic markers for strength athlete status include ACTN3 rs1815739 C, AR ≥21 CAG repeats, LRPPRC rs10186876 A, MMS22L rs9320823 T, PHACTR1 rs6905419 C, and PPARG rs1801282 G alleles.

Conclusions
The current review demonstrates that at least 251 genetic markers are reportedly linked to sport-related traits. However, only 128 (51%) of these markers (41 endurance-related, 45 power-related, and 42 strength-related) have been associated with athlete status in two or more studies. Moreover, of these 128 genetic markers, the significance of 29 (22.7%) DNA polymorphisms was not replicated in at least one study, raising the possibility that a number of findings may represent false positives. Several factors may explain why the findings of one study are not replicated by another, including disparity of sample sizes, small sample sizes in one or more of the studies, different study designs, inconsistent classification of sporting groups or types of sport (strength, power, etc.), variability in how researchers or research groups define the term "elite athletes" (some researchers define "elite" as performing at the international level, others as being a prize winner in international competitions), and the ethnicity/geographical ancestry of the cohorts studied, amongst others.
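The proportions quoted above follow directly from the marker counts, as the short check below shows. The variable names are illustrative; the numbers are those reported in this review.

```python
# Markers associated with athlete status in at least two studies, by category
replicated = {"endurance": 41, "power": 45, "strength": 42}
total_markers = 251          # all markers reported to date
failed_replication = 29      # replicated markers with at least one negative study

n_replicated = sum(replicated.values())
share_replicated = round(100 * n_replicated / total_markers)
share_failed = round(100 * failed_replication / n_replicated, 1)
print(n_replicated, share_replicated, share_failed)  # 128 51 22.7
```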
As discussed previously, height remains not only the most studied exercise-related phenotype at the genetic level, but also the most studied human trait, with 12,111 associated SNPs [31]. It is estimated that the final number of height-related SNPs may reach 25,000 (with a minor allele frequency of ≥1%), but reaching it would require increasing the sample size to approximately 100 million individuals of the same ethnicity. These values are noteworthy and serve as a benchmark for the direction of future research in the field of sports genomics, where the current number of 251 genetic markers must be increased by a considerable magnitude in order to fully comprehend the genomic underpinnings of exercise performance, and thus to be considered as potential predictors of talent in sport. Given that effective talent identification remains a challenging task despite decades of research and strategy [322][323][324], it remains possible that the development of predictive genetic performance tests in future may be able to contribute to the advancement of this field. However, the literature currently available does not support the use of genetic testing for these purposes [325][326][327].
Whilst genomics is among the most established molecular sub-disciplines of sport and exercise research, sport- and exercise-related DNA polymorphisms do not fully explain the heritability of athlete status. Consequently, other forms of variation, such as rare mutations [328,329] and epigenetic markers (i.e., stable and heritable changes in gene expression) [330], must be considered. Newly emerging high-throughput technologies enable the design of multi-omics approaches integrating various -omics levels (metabolomics, transcriptomics, proteomics, epigenomics, etc.) with the aim of determining how each level contributes to the biological mechanisms underpinning physical performance. For example, transcriptomic analyses have revealed the roles of both genomic and epigenomic mechanisms in modulating the transcription of genes regulated by exercise [2,331,332]. Incorporating multi-omics approaches has the potential to drastically advance the understanding of how the acute response to exercise is regulated, and consequently how chronic adaptations to exercise are mediated in the context of elite performance and/or health and wellbeing. Accordingly, future research, including collaborative multicentre GWASs and whole-genome sequencing of large athlete cohorts with further validation and replication, as well as the use of large purpose-built Biobanks, should focus on identifying genetic and other -omics markers of sport-related phenotypes and their underlying biology.
Our review does have limitations. First, we have not provided information regarding genetic markers associated with team (game) and combat sports, flexibility, coordination, personality, cognitive abilities, muscle fibre composition, skeletal muscle hypertrophy, injuries, and responses to training/supplements. These markers are well described elsewhere [4,24,26,28,37,61,74,79,115-117,333,334]. Second, we have not described all studies in detail (ethnicity, specific sporting disciplines, sample size, p-values, etc.) given the word limit. Third, some genetic markers (out of the 128 most significant) were selected based on data obtained in case-control studies only, without confirmation of functional significance (genotype-phenotype studies are therefore warranted).
In conclusion, our literature search revealed at least 251 DNA polymorphisms that could be associated with endurance, power, and strength athlete statuses. Most of these genetic markers have been discovered in studies involving Australian, Brazilian, British, Canadian, Chinese, Croatian, Czech, Ethiopian, Finnish, French, German, Greek, Hungarian, Indian, Iranian, Israeli, Italian, Jamaican, Japanese, Kenyan, Korean, Lithuanian, Polish, Qatari, Russian, Slovenian, South African, Spanish, Taiwanese, Tatar, Tunisian, Turkish, Ukrainian, and US athletes.